Corpus-Oriented Grammar Development for Acquiring a Head-Driven Phrase Structure Grammar from the Penn Treebank
نویسندگان
چکیده
This paper describes a method of semi-automatically acquiring an English HPSG grammar from the Penn Treebank. First, heuristic rules are employed to annotate the treebank with partially-specified derivation trees. Lexical entries are automatically extracted from the annotated corpus by inversely applying schemata to partially-specified derivation trees.
منابع مشابه
Corpus-Oriented Development of Japanese HPSG Parsers
This paper reports the corpus-oriented development of a wide-coverage Japanese HPSG parser. We first created an HPSG treebank from the EDR corpus by using heuristic conversion rules, and then extracted lexical entries from the treebank. The grammar developed using this method attained wide coverage that could hardly be obtained by conventional manual development. We also trained a statistical p...
متن کاملFrom Linguistic Theory to Syntactic Analysis: Corpus-Oriented Grammar Development and Feature Forest Model
The goal of this thesis is to establish a system for the automatic syntactic analysis of real-world text. Syntactic analysis in this thesis denotes computation of in-depth syntactic structures that are grounded in syntactic theories like Head-Driven Phrase Structure Grammar (HPSG). Since syntactic structures provide essential components for computing meanings of natural language sentences, the ...
متن کاملParse disambiguation for a rich HPSG grammar
The fine-grained nature of the HPSG representations found in the Redwoods treebank raises novel issues in parse disambiguation relative to more traditional treebanks such as the Penn treebank, which have been the focus of most past work on probabilistic parsing (e.g., Charniak 1997; Collins 1997). The Redwoods treebank is much richer in the representations it makes available. Most similar to Pe...
متن کاملAcquiring an Ontology for a Fundamental Vocabulary
In this paper we describe the extraction of thesaurus information from parsed dictionary definition sentences. The main data for our experiments comes from Lexeed, a Japanese semantic dictionary, and the Hinoki treebank built on it. The dictionary is parsed using a head-driven phrase structure grammar of Japanese. Knowledge is extracted from the semantic representation (Minimal Recursion Semant...
متن کاملTowards an LFG parser for Polish: An exercise in parasitic grammar development
While it is possible to build a formal grammar manually from scratch or, going to another extreme, to derive it automatically from a treebank, the development of the LFG grammar of Polish presented in this paper is different from both of these methods as it relies on extensive reuse of existing language resources for Polish. LFG grammars minimally provide two levels of representation: constitue...
متن کامل